Storing data in a scalable web application can be tricky. A user could be interacting with any of dozens of web servers at a given time, and the user's next request could go to a different web server than the previous request. All web servers need to be interacting with data that is also spread out across dozens of machines, possibly in different locations around the world.
With Google App Engine, you don't have to worry about any of that. App Engine's infrastructure takes care of all of the distribution, replication, and load balancing of data behind a simple API—and you get a powerful query engine and transactions as well.
App Engine's data repository, the High Replication Datastore (HRD), uses the Paxos algorithm to replicate data across multiple data centers. Data is written to the Datastore in objects known as entities. Each entity has a key that uniquely identifies it. An entity can optionally designate another entity as its parent; the first entity is a child of the parent entity. The entities in the Datastore thus form a hierarchically structured space similar to the directory structure of a file system. An entity's parent, parent's parent, and so on recursively, are its ancestors; its children, children's children, and so on, are its descendants. An entity without a parent is a root entity.
The Datastore is extremely resilient in the face of catastrophic failure, but its consistency guarantees may differ from what you're familiar with. Entities descended from a common ancestor are said to belong to the same entity group; the common ancestor's key is the group's parent key, which serves to identify the entire group. Queries over a single entity group, called ancestor queries, refer to the parent key instead of a specific entity's key. Entity groups are a unit of both consistency and transactionality: whereas queries over multiple entity groups may return stale, eventually consistent results, those limited to a single entity group always return up-to-date, strongly consistent results.
The code samples in this guide organize related entities into entity groups, and use ancestor queries on those entity groups to return strongly consistent results. In the example code comments, we highlight some ways this might affect the design of your application. For more detailed information, see Structuring Data for Strong Consistency.
Note: If you built your application using an earlier version of this Getting Started Guide, please note that the sample application has changed. You can still find the sample code for the original Guestbook application, which does not use ancestor queries, in the demos directory of the SDK.
The Datastore is one of several App Engine services offering a choice of standards-based or low-level APIs. The standards-based APIs decouple your application from the underlying App Engine services, making it easier to port your application to other hosting environments and other database technologies, if you ever need to. The low-level APIs expose the service's capabilities directly; you can use them as a base on which to implement new adapter interfaces, or just use them directly in your application.
App Engine includes support for two different API standards for the Datastore: Java Data Objects (JDO) and the Java Persistence API (JPA). These interfaces are provided by DataNucleus Access Platform, an open-source implementation of several Java persistence standards, with an adapter for the App Engine Datastore.
For clarity getting started, we'll use the low-level API to retrieve and post messages left by users.
Here is an updated version of src/SignGuestbookServlet.java that stores greetings in the Datastore. We will discuss the changes made here below.
package guestbook;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.users.User;
import com.google.appengine.api.users.UserService;
import com.google.appengine.api.users.UserServiceFactory;
import java.io.IOException;
import java.util.Date;
import java.util.logging.Logger;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class SignGuestbookServlet extends HttpServlet {
private static final Logger log =
Logger.getLogger(SignGuestbookServlet.class.getName());
public void doPost(HttpServletRequest req, HttpServletResponse resp)
throws IOException {
UserService userService = UserServiceFactory.getUserService();
User user = userService.getCurrentUser();
// We have one entity group per Guestbook with all Greetings residing
// in the same entity group as the Guestbook to which they belong.
// This lets us run an ancestor query to retrieve all Greetings for a
// given Guestbook. However, the write rate to each Guestbook should be
// limited to ~1/second.
String guestbookName = req.getParameter("guestbookName");
Key guestbookKey = KeyFactory.createKey("Guestbook", guestbookName);
String content = req.getParameter("content");
Date date = new Date();
Entity greeting = new Entity("Greeting", guestbookKey);
greeting.setProperty("user", user);
greeting.setProperty("date", date);
greeting.setProperty("content", content);
DatastoreService datastore =
DatastoreServiceFactory.getDatastoreService();
datastore.put(greeting);
resp.sendRedirect("/guestbook.jsp?guestbookName="
+ guestbookName);
}
}
The low-level Datastore API for Java provides a schemaless interface for creating and storing entities. The low-level API does not require entities of the same kind to have the same properties, nor for a given property to have the same type for different entities. The following code snippet constructs the Greeting entity in the same entity group as the guestbook to which it belongs:
Entity greeting = new Entity("Greeting", guestbookKey);
greeting.setProperty("user", user);
greeting.setProperty("date", date);
greeting.setProperty("content", content);
In our example, each Greeting has the posted content, and also stores the user information about who posted, and the date on which the post was submitted. When initializing the entity, we supply the entity name, Greeting, as well as a guestbookKey argument that sets the parent of the entity we are storing. Objects in the Datastore that share a common ancestor belong to the same entity group.
After we construct the entity, we instantiate the Datastore service, and put the entity in the Datastore:
DatastoreService datastore =
DatastoreServiceFactory.getDatastoreService();
datastore.put(greeting);
Because querying in the High Replication Datastore is strongly consistent only within entity groups, we assign all Greetings to the same entity group by setting the same parent for each Greeting. This means a user will always see a Greeting immediately after it was written. However, the rate at which you can write to the same entity group is limited to 1 write to the entity group per second. When you design a real application, you'll need to keep this fact in mind. Note that by using services such as Memcache, you can mitigate the chance that a user won't see fresh results when querying across entity groups immediately after a write.
We also need to modify the JSP we wrote earlier to display Greetings from the Datastore, and also include a form for submitting Greetings. Here is our updated guestbook.jsp:
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<%@ page import="java.util.List" %>
<%@ page import="com.google.appengine.api.users.User" %>
<%@ page import="com.google.appengine.api.users.UserService" %>
<%@ page import="com.google.appengine.api.users.UserServiceFactory" %>
<%@ page import="com.google.appengine.api.datastore.DatastoreServiceFactory" %>
<%@ page import="com.google.appengine.api.datastore.DatastoreService" %>
<%@ page import="com.google.appengine.api.datastore.Query" %>
<%@ page import="com.google.appengine.api.datastore.Entity" %>
<%@ page import="com.google.appengine.api.datastore.FetchOptions" %>
<%@ page import="com.google.appengine.api.datastore.Key" %>
<%@ page import="com.google.appengine.api.datastore.KeyFactory" %>
<%@ taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions" %>
<html>
<head>
<link type="text/css" rel="stylesheet" href="/stylesheets/main.css" />
</head>
<body>
<%
String guestbookName = request.getParameter("guestbookName");
if (guestbookName == null) {
guestbookName = "default";
}
pageContext.setAttribute("guestbookName", guestbookName);
UserService userService = UserServiceFactory.getUserService();
User user = userService.getCurrentUser();
if (user != null) {
pageContext.setAttribute("user", user);
%>
<p>Hello, ${fn:escapeXml(user.nickname)}! (You can
<a href="<%= userService.createLogoutURL(request.getRequestURI()) %>">sign out</a>.)</p>
<%
} else {
%>
<p>Hello!
<a href="<%= userService.createLoginURL(request.getRequestURI()) %>">Sign in</a>
to include your name with greetings you post.</p>
<%
}
%>
<%
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Key guestbookKey = KeyFactory.createKey("Guestbook", guestbookName);
// Run an ancestor query to ensure we see the most up-to-date
// view of the Greetings belonging to the selected Guestbook.
Query query = new Query("Greeting", guestbookKey).addSort("date", Query.SortDirection.DESCENDING);
List<Entity> greetings = datastore.prepare(query).asList(FetchOptions.Builder.withLimit(5));
if (greetings.isEmpty()) {
%>
<p>Guestbook '${fn:escapeXml(guestbookName)}' has no messages.</p>
<%
} else {
%>
<p>Messages in Guestbook '${fn:escapeXml(guestbookName)}'.</p>
<%
for (Entity greeting : greetings) {
pageContext.setAttribute("greeting_content",
greeting.getProperty("content"));
if (greeting.getProperty("user") == null) {
%>
<p>An anonymous person wrote:</p>
<%
} else {
pageContext.setAttribute("greeting_user",
greeting.getProperty("user"));
%>
<p><b>${fn:escapeXml(greeting_user.nickname)}</b> wrote:</p>
<%
}
%>
<blockquote>${fn:escapeXml(greeting_content)}</blockquote>
<%
}
}
%>
<form action="/sign" method="post">
<div><textarea name="content" rows="3" cols="60"></textarea></div>
<div><input type="submit" value="Post Greeting" /></div>
<input type="hidden" name="guestbookName" value="${fn:escapeXml(guestbookName)}"/>
</form>
</body>
</html>
Warning! Whenever you display user-supplied text in HTML, you must escape the string using the fn:escapeXml JSTL function, or a similar escaping mechanism. If you do not correctly and consistently escape user-supplied data, the user could supply a malicious script as text input, causing harm to later visitors.
The low-level Java API provides a Query class for constructing queries and a PreparedQuery class for fetching and returning the entities that match the query from the Datastore. The code that fetches the data is here:
Query query = new Query("Greeting", guestbookKey).addSort("date", Query.SortDirection.DESCENDING);
List<Entity> greetings = datastore.prepare(query).asList(FetchOptions.Builder.withLimit(5));
This code creates a new query on the Greeting entity, and sets the guestbookKey as the required parent entity for all entities that will be returned. We also sort on the date property, returning the newest Greeting first.
After you construct the query, it is prepared and returned as a list of Entity objects. For a description of the Query and PreparedQuery interfaces, see the Datastore reference.
Every query in the App Engine Datastore is computed from one or more indexes. Indexes are tables that map ordered property values to entity keys. This is how App Engine is able to serve results quickly regardless of the size of your application's Datastore. Many queries can be computed from the builtin indexes, but the Datastore requires you to specify a custom index for some, more complex, queries. Without a custom index, the Datastore can't execute the query efficiently.
Our guest book example above, which filters results by ancestor and orders by date, uses an ancestor query and a sort order. This query requires a custom index to be specified in your application's datastore-indexes.xml file. When you run your application in the SDK, it will automatically add an entry to this file. When you upload your application, the custom index definition will be automatically uploaded, too. The entry for this query will look like:
<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes
autoGenerate="true">
<datastore-index kind="Greeting" ancestor="true">
<property name="date" direction="desc" />
</datastore-index>
</datastore-indexes>
You can read all about Datastore indexes in the Introduction to Indexes section.
The development web server uses a local version of the Datastore for testing your application, using local files. The data persists as long as the temporary files exist, and the web server does not reset these files unless you ask it to do so.
The file is named local_db.bin, and it is created in your application's WAR directory, in the WEB-INF/appengine-generated/ directory. To clear the Datastore, delete this file.
Every web application returns dynamically generated HTML from the application code, via templates or some other mechanism. Most web applications also need to serve static content, such as images, CSS stylesheets, or JavaScript files. For efficiency, App Engine treats static files differently from application source and data files. Let's create a CSS stylesheet for this application, as a static file.
Continue to Using Static Files.